[CORE-15194] schema_registry: add subject query param to GET /schemas/ids/{id} #29451
[CORE-15194] schema_registry: add subject query param to GET /schemas/ids/{id} #29451nguyen-andrew wants to merge 8 commits intoredpanda-data:devfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds an optional subject query parameter to the GET /schemas/ids/{id} endpoint to enable context-aware schema lookups. When provided, the subject name (which can include context in the format :.context:subject) is used to determine the context for retrieving the schema, rather than always using the default context.
Changes:
- Modified the endpoint handler to parse and use the optional
subjectparameter to extract context information - Updated the Python test client to support passing the
subjectparameter - Added comprehensive test coverage verifying the new parameter works correctly with default and non-default contexts
- Updated API documentation to describe the new query parameter
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/v/pandaproxy/schema_registry/handlers.cc | Added logic to parse the optional subject query parameter, extract context from it, and use the context when looking up schemas by ID |
| tests/rptest/tests/schema_registry_test.py | Updated test client method to accept subject parameter and added comprehensive test case covering various scenarios |
| src/v/pandaproxy/api/api-doc/schema_registry.json | Added documentation for the new subject query parameter |
6118d1f to
fa5356c
Compare
|
Force push to rebase on latest dev. |
CI test resultstest results on build#79806
test results on build#80163test results on build#80198
|
| auto subject_param = parse::query_param<std::optional<ss::sstring>>( | ||
| *rq.req, "subject"); | ||
|
|
||
| // Extract context from subject, or use default context |
There was a problem hiding this comment.
I think the behaviour is a bit trickier here unfortunately:
# Register the same schema in two contexts
% curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": '"$(cat ~/tasks/avro-refs/address.avsc | jq -Rs .)"', "schemaType": "AVRO"}' http://localhost:8081/subjects/:.prod:Ad
dress/versions
{"id":1,"version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}"}%
% curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": '"$(cat ~/tasks/avro-refs/address.avsc | jq -Rs .)"', "schemaType": "AVRO"}' http://localhost:8081/subjects/:.shared:Address/versions
{"id":1,"version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}"}%
# Query the schemas
% curl "http://localhost:8081/schemas/ids/1?subject=:.shared:"
{"subject":":.shared:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688260232,"deleted":false}%
% curl "http://localhost:8081/schemas/ids/1?subject=:.prod:"
{"subject":":.prod:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688254717,"deleted":false}%
% curl "http://localhost:8081/schemas/ids/1?subject=Address"
{"subject":":.prod:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688254717,"deleted":false}%
% curl "http://localhost:8081/schemas/ids/1?subject=:.:"
{"error_code":40403,"message":"Schema 1 not found"}%
% curl "http://localhost:8081/schemas/ids/1?subject=:.prod:NotAddress"
{"error_code":40403,"message":"Schema 1 not found"}%
% curl "http://localhost:8081/schemas/ids/1?subject=:.shared:NotAddress"
{"error_code":40403,"message":"Schema 1 not found"}%I think we might need to treat the subject parameter differently depending on whether it contains only a context (empty subject) or a real subject.
To be honest, we might be able to get away with partial support of the parameter, by throwing if it contains a real subject, and not just a context. But if we can implement it fully, that would be great.
3a8f4d2 to
b6d4e37
Compare
b6d4e37 to
26fe7f5
Compare
|
Force push to address PR comments. |
pgellert
left a comment
There was a problem hiding this comment.
I think the AuthZ logic is not quite correct yet. I think the simplest behaviour we could implement here that is consistent with the earlier "allow the lookup if we have access through any subject" behaviour is that resolve_schema_across_contexts could look up both the schema definition as well as the list of subjects that would provide access to the schema, and then call the AuthZ handler only once from the handler for the full list of subjects that would provide access.
264bba4 to
31aaab3
Compare
Update context_subject::from_string to handle context-only strings.
Rename is_default_context() to is_default_context_only() to better reflect its behavior: it returns true only when the context is the default context AND the subject is empty (context-only). Add a new method is_non_default_context() that checks if a subject is in a non-default context. This will be used in future changes to handle subject query parameters.
This enables retrieving subjects filtered by a specific context.
31aaab3 to
07557bc
Compare
|
Force push to fix broken unit test and improve new ducktape tests. |
Previously, the GET /schemas/ids/{id} endpoint resolved schemas using
only the numeric ID without any context or subject scoping. This meant
clients could not target a specific context when retrieving a schema.
This change adds support for an optional `subject` query parameter,
parsed via context_subject::from_string(), that controls how the
schema ID is resolved:
- Default context with subject (e.g., "subj" or ":.:subj"): perform
an extended search: default context with subject first, then other
contexts with subject, then default context without the subject
restriction
- All other cases: resolve within a single context only
The schema definition is now retrieved using the resolved context,
ensuring the correct schema is returned in multi-context deployments.
Add test for the `subject` query parameter on GET /schemas/ids/{id},
verifying that it correctly extracts context for schema lookup.
Also update the test client to support the new parameter.
Extract shared ACL test infrastructure (setup, helpers) into a new SchemaRegistryAclAuthzTestBase class to enable reuse by other tests.
Add SchemaRegistryContextAuthzTest to verify ACL enforcement when using
context-qualified subjects and the subject query parameter with
GET /schemas/ids/{id}. The tests cover authorization scenarios including
literal and prefix ACLs on context-qualified subjects, cross-context
search behavior, and proper 403 responses to prevent information leakage.
07557bc to
0596bc4
Compare
|
Force push to fix bug. |
pgellert
left a comment
There was a problem hiding this comment.
I left a few comments, mainly minor points + nits, but the core behaviour is looking great. Good job on figuring out this one!
| if (ctx_sub.ctx == default_context && !ctx_sub.sub().empty()) { | ||
| vlog( | ||
| srlog.error, | ||
| "resolve_schema_id_simple cannot be called with default context " | ||
| "and non-empty subject"); | ||
| throw exception(error_code::internal_server_error); | ||
| } |
There was a problem hiding this comment.
nit: I'd probably use a vassert here to assert these kinds of pre-conditions that we expect to hold
| for (const auto& ctx : contexts | std::views::filter([](const auto& c) { | ||
| return c != default_context; | ||
| })) { |
There was a problem hiding this comment.
I'd move this c != default_context filtering into the loop as an if (...) { continue; }. I'm wondering if it is safe to iterate over an r-value range view while co_await'ing in the loop. Maybe it is, I don't know off the top of my hat and I'd have to think a bit harder about this to confirm.
| ctx, | ||
| subject()); | ||
| enterprise::handle_get_schemas_ids_id_authz( | ||
| rq, auth_result, {ctx_sub}); |
There was a problem hiding this comment.
nit: you could move the ctx_sub here, just like on L262.
| srlog.error, | ||
| "resolve_schema_id_extended should only be called with non-empty " | ||
| "subject"); | ||
| throw exception(error_code::internal_server_error); |
There was a problem hiding this comment.
Same point about vassert here
| @cluster(num_nodes=1) | ||
| def test_subject_param_with_authorized_subject(self): | ||
| """ | ||
| GET /schemas/ids/{id}?subject=sub1 succeeds when user has READ on sub1. |
There was a problem hiding this comment.
This is a subset of the next test (test_subject_param_unauthorized_despite_other_subject_access), so let's merge them to simplify.
| #include <algorithm> | ||
| #include <iterator> | ||
| #include <limits> | ||
| #include <optional> |
There was a problem hiding this comment.
Excellent commit message on this one!
| return is_context_only() && ctx == default_context; | ||
| } | ||
|
|
||
| bool is_non_default_context() const { return ctx != default_context; } |
There was a problem hiding this comment.
Is is_non_default_context() used after all (I can't see any usage)? Let's drop that commit if we don't need it.
| id, | ||
| ctx_sub.ctx, | ||
| ctx_sub.sub().empty() ? "" | ||
| : ss::sstring{", subject '"} + ctx_sub.sub() + "'"); |
There was a problem hiding this comment.
nit/fyi: you could use ss::format(", subject '{}'", ctx_sub.sub()) here
| result = self.sr_client.post_subjects_subject_versions( | ||
| subject="sub1", data=json.dumps({"schema": schema1_def}) | ||
| ) | ||
| assert result.status_code == requests.codes.ok |
There was a problem hiding this comment.
nit: I'd swap these asserts for self.assert_equal(...). The issue with these raw asserts is that when they fail, we don't see what the value of result.status_code was, which makes it more difficult to debug flaky tests.
Add an optional
subjectquery parameter to theGET /schemas/ids/{id}endpoint. This allows specifying the context for schema lookup by extracting the context from the provided subject name (e.g.,:.myctx:mysubject).Fixes CORE-15194
Backports Required
Release Notes